Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs
Abstract
Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. A number of programming approaches facilitate this, such as CUDA, OpenACC and OpenMP 4, which support different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance, such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. The results show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches have performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, which are currently around 10% slower than CUDA.
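To make the kind of loop being compared more concrete, here is a minimal sketch of an edge-based unstructured mesh loop written with OpenMP 4 `target` directives. It is an illustration only, not code from the study: the names `edge_loop`, `edge2node`, `node_val` and `node_res`, and the simple difference "flux", are hypothetical. Each edge reads its two nodes through an indirection map and increments both node residuals. Because edges sharing a node would otherwise race, the increments use `omp atomic`. Colouring the edges is a common alternative strategy, and the study's kernels may use either approach.

```c
#include <assert.h>

/* Hypothetical edge-based residual loop (illustrative, not from the study).
   edge2node holds two node indices per edge (indirect access);
   each edge increments the residuals of both of its nodes. */
void edge_loop(int n_edges, int n_nodes, const int *edge2node,
               const double *node_val, double *node_res)
{
    assert(n_edges >= 0 && n_nodes >= 0);

    /* OpenMP 4 offload: copy mesh data to the device and residuals back.
       Without OpenMP support the pragmas are ignored and the loop runs serially;
       with OpenMP but no device, the target region falls back to the host. */
    #pragma omp target teams distribute parallel for \
        map(to: edge2node[0:2*n_edges], node_val[0:n_nodes]) \
        map(tofrom: node_res[0:n_nodes])
    for (int e = 0; e < n_edges; ++e) {
        int a = edge2node[2 * e];
        int b = edge2node[2 * e + 1];
        double flux = node_val[a] - node_val[b];

        /* Different edges may share a node, so indirect increments are atomic. */
        #pragma omp atomic
        node_res[a] -= flux;
        #pragma omp atomic
        node_res[b] += flux;
    }
}
```

For example, on a four-node line mesh with edges (0,1), (1,2), (2,3), node values 1, 2, 4, 8 and zero-initialised residuals, the call `edge_loop(3, 4, edge2node, node_val, node_res)` leaves the residuals at 1, 1, 2 and -4. Their sum is zero, as expected for a loop in which each edge adds to one node exactly what it subtracts from the other.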
Publication date: 2017